The data in this notebook is generation and consumption by fuel type for the entire US. These values are larger than what would be calculated by summing facility-level data. Note that the fuel types are somewhat aggregated (coal rather than BIT, SUB, LIG, etc). So when we multiply the fuel consumption by an emissions factor there will be some level of error.
The code assumes that you have already downloaded an ELEC.txt
file from EIA's bulk download website.
In [1]:
import json
import pandas as pd
import os
from os.path import join
import numpy as np
import sys
cwd = os.getcwd()
data_path = join(cwd, '..', 'Data storage')
idx = pd.IndexSlice
In [ ]:
file_date = '2018-03-06'
In [2]:
%load_ext watermark
In [3]:
%watermark -iv -v
In [4]:
# Load the "autoreload" extension
%load_ext autoreload
# always reload modules marked with "%aimport"
%autoreload 1
# add the 'src' directory as one where we can import modules
src_dir = join(os.getcwd(), os.pardir, 'src')
sys.path.append(src_dir)
In [5]:
%aimport Analysis.index
from Analysis.index import add_datetime, add_quarter
In [55]:
cwd = os.getcwd()
path = join(data_path, 'Raw EIA bulk', '{} ELEC.txt'.format(file_date))
with open(path, 'r') as f:
raw_txt = f.readlines()
Fuel codes:
In [56]:
def line_to_df(line):
"""
Takes in a line (dictionary), returns a dataframe
"""
for key in ['latlon', 'source', 'copyright', 'description',
'geoset_id', 'iso3166', 'name', 'state']:
line.pop(key, None)
# Split the series_id up to extract information
# Example: ELEC.PLANT.GEN.388-WAT-ALL.M
series_id = line['series_id']
series_id_list = series_id.split('.')
# Use the second to last item in list rather than third
plant_fuel_mover = series_id_list[-2].split('-')
line['type'] = plant_fuel_mover[0]
# line['state'] = plant_fuel_mover[1]
line['sector'] = plant_fuel_mover[2]
temp_df = pd.DataFrame(line)
try:
temp_df['year'] = temp_df.apply(lambda x: x['data'][0][:4], axis=1).astype(int)
temp_df['month'] = temp_df.apply(lambda x: x['data'][0][-2:], axis=1).astype(int)
temp_df['value'] = temp_df.apply(lambda x: x['data'][1], axis=1)
temp_df.drop('data', axis=1, inplace=True)
return temp_df
except:
exception_list.append(line)
pass
In [57]:
states = ["AL", "AK", "AZ", "AR", "CA", "CO", "CT", "DE",
"FL", "GA", "HI", "ID", "IL", "IN", "IA", "KS",
"KY", "LA", "ME", "MD", "MA", "MI", "MN", "MS",
"MO", "MT", "NE", "NV", "NH", "NJ", "NM", "NY",
"NC", "ND", "OH", "OK", "OR", "PA", "RI", "SC",
"SD", "TN", "TX", "UT", "VT", "VA", "WA", "WV", "WI", "WY"]
In [58]:
state_geos = ['USA-{}'.format(state) for state in states]
In [59]:
type(json.loads(raw_txt[0]))
Out[59]:
In [60]:
json.loads(raw_txt[0])['geography']
Out[60]:
In [61]:
gen_rows = [row for row in raw_txt if 'ELEC.GEN' in row
and 'series_id' in row
and '-99.M' in row
and 'ALL' not in row]
total_fuel_rows = [row for row in raw_txt if 'ELEC.CONS_TOT_BTU' in row
and 'series_id' in row
and '-99.M' in row
and 'ALL' not in row
and 'US-99.m' not in row]
eg_fuel_rows = [row for row in raw_txt if 'ELEC.CONS_EG_BTU' in row
and 'series_id' in row
and '-99.M' in row
and 'ALL' not in row
and 'US-99.m' not in row]
In [62]:
gen_dicts = [json.loads(row) for row in gen_rows]
In [63]:
gen_df = pd.concat([line_to_df(x) for x in gen_dicts
if x['geography'] in state_geos])
In [64]:
#drop
gen_df.head()
Out[64]:
In [65]:
gen_df['geography'].unique()
Out[65]:
Multiply generation values by 1000 and change the units to MWh
In [66]:
gen_df.loc[:,'value'] *= 1000
gen_df.loc[:,'units'] = 'megawatthours'
In [67]:
gen_df.rename_axis({'value':'generation (MWh)'}, axis=1, inplace=True)
In [68]:
gen_df.loc[gen_df.isnull().any(axis=1)]
Out[68]:
In [69]:
gen_df.dropna(inplace=True)
In [70]:
gen_df.set_index(['type', 'year', 'month', 'geography'], inplace=True)
In [71]:
total_fuel_dict = [json.loads(row) for row in total_fuel_rows]
In [72]:
total_fuel_df = pd.concat([line_to_df(x) for x in total_fuel_dict
if x['geography'] in state_geos])
Multiply generation values by 1,000,000 and change the units to MMBtu
In [73]:
total_fuel_df.loc[:,'value'] *= 1E6
total_fuel_df.loc[:,'units'] = 'mmbtu'
In [74]:
total_fuel_df.rename_axis({'value':'total fuel (mmbtu)'}, axis=1, inplace=True)
In [75]:
total_fuel_df.set_index(['type', 'year', 'month', 'geography'], inplace=True)
Drop nans
In [76]:
total_fuel_df.loc[total_fuel_df.isnull().any(axis=1)]
Out[76]:
In [77]:
total_fuel_df = total_fuel_df.dropna()
In [78]:
eg_fuel_dict = [json.loads(row) for row in eg_fuel_rows]
In [79]:
eg_fuel_df = pd.concat([line_to_df(x) for x in eg_fuel_dict
if x['geography'] in state_geos])
Multiply generation values by 1,000,000 and change the units to MMBtu
In [80]:
eg_fuel_df.loc[:,'value'] *= 1E6
eg_fuel_df.loc[:,'units'] = 'mmbtu'
In [81]:
eg_fuel_df.rename_axis({'value':'elec fuel (mmbtu)'}, axis=1, inplace=True)
In [82]:
eg_fuel_df.set_index(['type', 'year', 'month', 'geography'], inplace=True)
In [83]:
#drop
eg_fuel_df.head()
Out[83]:
I verified on EIA's website that the values below are correct.
In [84]:
#drop
eg_fuel_df.loc[~(eg_fuel_df['elec fuel (mmbtu)'] >= 0) &
~(eg_fuel_df['elec fuel (mmbtu)'].isnull())]
Out[84]:
In [85]:
eg_fuel_df.dropna(inplace=True)
In [86]:
fuel_df = pd.concat([total_fuel_df, eg_fuel_df['elec fuel (mmbtu)']], axis=1)
Not sure how this happens in EIA's data, but we do see the negative fuel consumption for electricity generation.
In [87]:
#drop
fuel_df.loc[~(fuel_df['elec fuel (mmbtu)']>=0)]
Out[87]:
In [88]:
#drop
fuel_df.loc[~(fuel_df['total fuel (mmbtu)']>=0)]
Out[88]:
The difficulty here is that EIA combines all types of coal fuel consumption together in the bulk download and API. Fortunately the emission factors for different coal types aren't too far off on an energy basis (BIT is 93.3 kg/mmbtu, SUB is 97.2 kg/mmbtu). I'm going to average the BIT and SUB factors rather than trying to do something more complicated. In 2015 BIT represented 45% of coal energy for electricity and SUB represented 48%.
Same issue with petroleum liquids. Using the average of DFO and RFO, which were the two largest share of petroleum liquids.
In [89]:
path = join(data_path, 'Final emission factors.csv')
ef = pd.read_csv(path, index_col=0)
Fuel codes:
In [90]:
fuel_factors = pd.Series({'NG' : ef.loc['NG', 'Fossil Factor'],
'PEL': ef.loc[['DFO', 'RFO'], 'Fossil Factor'].mean(),
'PC' : ef.loc['PC', 'Fossil Factor'],
'COW' : ef.loc[['BIT', 'SUB'], 'Fossil Factor'].mean(),
'OOG' : ef.loc['OG', 'Fossil Factor']}, name='type')
In [91]:
#drop
fuel_factors
Out[91]:
In [92]:
fuel_df['all fuel CO2 (kg)'] = (fuel_df['total fuel (mmbtu)']
.multiply(fuel_factors, level='type',
fill_value=0))
fuel_df['elec fuel CO2 (kg)'] = (fuel_df['elec fuel (mmbtu)']
.multiply(fuel_factors, level='type',
fill_value=0))
In [93]:
fuel_df.head()
Out[93]:
In [94]:
fuel_cols = ['total fuel (mmbtu)', 'elec fuel (mmbtu)',
'all fuel CO2 (kg)', 'elec fuel CO2 (kg)']
gen_fuel_df = pd.concat([gen_df, fuel_df[fuel_cols]], axis=1)
Add datetime and quarter columns
In [95]:
add_quarter(gen_fuel_df)
In [96]:
gen_fuel_df.head()
Out[96]:
No records with positive fuel use but no generation
In [97]:
gen_fuel_df['generation (MWh)'].fillna(value=0, inplace=True)
In [98]:
gen_fuel_df.loc['COW',:].head()
Out[98]:
State-level
In [99]:
path = join(data_path, 'EIA state-level gen fuel CO2 2018-03-06.csv')
gen_fuel_df.to_csv(path)
National totals
In [100]:
nat_gen_fuel = gen_fuel_df.groupby(['type', 'year', 'month']).sum()
add_quarter(nat_gen_fuel)
In [101]:
nat_gen_fuel.tail()
Out[101]:
In [102]:
path = join(data_path, 'Derived data',
'EIA country-wide gen fuel CO2 {}.csv'.format(file_date))
gen_fuel_df.to_csv(path)
In [ ]: